{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "47d72ea2",
   "metadata": {},
   "source": [
    "# Swin Transformer 总结\n",
    "\n",
    "## 1. 历史背景和解决的问题\n",
    "Swin Transformer 是在视觉 Transformer（ViT）的基础上改进而来的。早期 ViT 直接将图像划分为固定大小的块并全局应用自注意力机制，虽然在分类任务中表现优异，但存在两个核心问题：\n",
    "- **计算复杂度高**：全局自注意力的计算量与图像尺寸成平方关系，难以处理高分辨率图像。\n",
    "- **缺乏层次结构**：ViT 的单一特征图无法像 CNN 一样通过多尺度卷积提取层级化特征，限制了其对多尺度目标的感知能力。\n",
    "\n",
    "Swin Transformer 通过引入 **滑动窗口机制** 和 **层次化特征图设计**，解决了上述问题，同时保留了 Transformer 的全局建模优势。\n",
    "\n",
    "---\n",
    "\n",
    "## 2. 模型的创新性和影响\n",
    "\n",
    "### 创新点\n",
    "- **滑动窗口注意力（Shifted Window Attention）**  \n",
    "  将图像划分为不重叠的局部窗口（如 7×7），仅在窗口内计算自注意力，大幅降低计算量（复杂度从 $ O(n^2) $ 降为 $ O(n) $）。通过 **交替移位窗口**（Shifting Operation）实现跨窗口交互，既保持局部性又增强全局感知。\n",
    "\n",
    "  ![alt text](resources/swin_transformer_sliding_window.png \"Title\")\n",
    "\n",
    "- **层次化特征图（Hierarchical Feature Maps）**  \n",
    "  通过逐步合并图像块（如 2×2 到 1 个块）实现多尺度特征提取，输出不同层级的特征图，支持目标检测、分割等需要多尺度的任务。\n",
    "\n",
    "- **相对位置编码（Relative Position Bias）**  \n",
    "  在窗口内引入可学习的相对位置偏置，弥补局部窗口丢失的绝对位置信息，提升模型对空间关系的建模能力。\n",
    "\n",
    "### 影响与意义\n",
    "- **通用视觉骨干网络**  \n",
    "  Swin Transformer 成为首个在分类、检测、分割等视觉任务中全面超越 CNN 的通用 backbone，推动了 Transformer 在 CV 领域的普及。\n",
    "  \n",
    "- **性能与效率平衡**  \n",
    "  在 ImageNet 等数据集上取得 SOTA 同时，其线性复杂度设计使模型能高效处理高分辨率图像，被广泛应用于超分辨率、视频分析等领域。\n",
    "\n",
    "- **启发后续研究**  \n",
    "  其“局部-全局”混合建模思想影响了后续模型（如 ConvNeXt、Focal Transformer），并促进了 CNN 与 Transformer 的融合研究。\n",
    "\n",
    "  ![alt text](resources/swin_transformer.png \"Title\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "84c7a5c5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Use device:  cuda\n"
     ]
    }
   ],
   "source": [
    "# 自动重新加载外部module，使得修改代码之后无需重新import\n",
    "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from hdd.device.utils import get_device\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "# 设置训练数据的路径\n",
    "DATA_ROOT = \"~/workspace/hands-dirty-on-dl/dataset\"\n",
    "# 设置TensorBoard的路径\n",
    "TENSORBOARD_ROOT = \"~/workspace/hands-dirty-on-dl/dataset\"\n",
    "# 设置预训练模型参数路径\n",
    "TORCH_HUB_PATH = \"~/workspace/hands-dirty-on-dl/pretrained_models\"\n",
    "torch.hub.set_dir(TORCH_HUB_PATH)\n",
    "# 挑选最合适的训练设备\n",
    "DEVICE = get_device([\"cuda\", \"cpu\"])\n",
    "max_epochs = 300\n",
    "print(\"Use device: \", DEVICE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1edd5bcd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "Files already downloaded and verified\n"
     ]
    }
   ],
   "source": [
    "from hdd.data_util.auto_augmentation import CIFAR10Policy\n",
    "\n",
    "# 训练超参数和数据增强来自 https://github.com/omihub777/ViT-CIFAR\n",
    "CIFAR_10_MEAN = [0.4914, 0.4822, 0.4465]\n",
    "CIFAR_10_STD = [0.2470, 0.2435, 0.2616]\n",
    "BATCH_SIZE = 128\n",
    "\n",
    "val_transform = transforms.Compose(\n",
    "    [\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize(CIFAR_10_MEAN, CIFAR_10_STD),\n",
    "    ]\n",
    ")\n",
    "\n",
    "val_dataloader = torch.utils.data.DataLoader(\n",
    "    datasets.CIFAR10(\n",
    "        root=DATA_ROOT, train=False, download=True, transform=val_transform\n",
    "    ),\n",
    "    batch_size=BATCH_SIZE,\n",
    "    shuffle=False,\n",
    "    num_workers=8,\n",
    "    pin_memory=True,\n",
    ")\n",
    "\n",
    "train_transform = transforms.Compose(\n",
    "    [\n",
    "        transforms.RandomCrop(size=32, padding=4),\n",
    "        transforms.RandomHorizontalFlip(),\n",
    "        CIFAR10Policy(),\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize(CIFAR_10_MEAN, CIFAR_10_STD),\n",
    "    ]\n",
    ")\n",
    "\n",
    "train_dataloader = torch.utils.data.DataLoader(\n",
    "    datasets.CIFAR10(\n",
    "        root=DATA_ROOT, train=True, download=True, transform=train_transform\n",
    "    ),\n",
    "    batch_size=BATCH_SIZE,\n",
    "    shuffle=True,\n",
    "    num_workers=8,\n",
    "    pin_memory=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "567bca4c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Parameter: 4560796\n",
      "Epoch: 1/300 Train Loss: 2.3166 Accuracy: 0.1005 Time: 14.07742  | Val Loss: 2.3194 Accuracy: 0.0991\n",
      "Epoch: 2/300 Train Loss: 2.1354 Accuracy: 0.2207 Time: 14.26677  | Val Loss: 1.8890 Accuracy: 0.3450\n",
      "Epoch: 3/300 Train Loss: 1.9731 Accuracy: 0.3015 Time: 14.30375  | Val Loss: 1.7422 Accuracy: 0.4166\n",
      "Epoch: 4/300 Train Loss: 1.8482 Accuracy: 0.3719 Time: 14.00262  | Val Loss: 1.6566 Accuracy: 0.4588\n",
      "Epoch: 5/300 Train Loss: 1.7520 Accuracy: 0.4175 Time: 14.29534  | Val Loss: 1.5586 Accuracy: 0.5030\n",
      "Epoch: 6/300 Train Loss: 1.6723 Accuracy: 0.4590 Time: 13.70866  | Val Loss: 1.4725 Accuracy: 0.5522\n",
      "Epoch: 7/300 Train Loss: 1.6178 Accuracy: 0.4876 Time: 14.03620  | Val Loss: 1.4252 Accuracy: 0.5828\n",
      "Epoch: 8/300 Train Loss: 1.5684 Accuracy: 0.5099 Time: 13.70768  | Val Loss: 1.3886 Accuracy: 0.5904\n",
      "Epoch: 9/300 Train Loss: 1.5381 Accuracy: 0.5255 Time: 14.26017  | Val Loss: 1.3929 Accuracy: 0.6039\n",
      "Epoch: 10/300 Train Loss: 1.5166 Accuracy: 0.5395 Time: 14.16402  | Val Loss: 1.3417 Accuracy: 0.6079\n",
      "Epoch: 11/300 Train Loss: 1.4853 Accuracy: 0.5488 Time: 14.02229  | Val Loss: 1.3164 Accuracy: 0.6331\n",
      "Epoch: 12/300 Train Loss: 1.4618 Accuracy: 0.5657 Time: 14.13935  | Val Loss: 1.2755 Accuracy: 0.6501\n",
      "Epoch: 13/300 Train Loss: 1.4307 Accuracy: 0.5760 Time: 14.44480  | Val Loss: 1.2843 Accuracy: 0.6511\n",
      "Epoch: 14/300 Train Loss: 1.4072 Accuracy: 0.5922 Time: 13.91981  | Val Loss: 1.2127 Accuracy: 0.6803\n",
      "Epoch: 15/300 Train Loss: 1.3780 Accuracy: 0.6021 Time: 14.32751  | Val Loss: 1.1960 Accuracy: 0.6933\n",
      "Epoch: 16/300 Train Loss: 1.3572 Accuracy: 0.6126 Time: 13.68790  | Val Loss: 1.2127 Accuracy: 0.6788\n",
      "Epoch: 17/300 Train Loss: 1.3251 Accuracy: 0.6288 Time: 13.77584  | Val Loss: 1.1579 Accuracy: 0.7102\n",
      "Epoch: 18/300 Train Loss: 1.2994 Accuracy: 0.6427 Time: 14.38644  | Val Loss: 1.1487 Accuracy: 0.7112\n",
      "Epoch: 19/300 Train Loss: 1.2719 Accuracy: 0.6537 Time: 14.13813  | Val Loss: 1.1146 Accuracy: 0.7285\n",
      "Epoch: 20/300 Train Loss: 1.2488 Accuracy: 0.6657 Time: 13.89032  | Val Loss: 1.0858 Accuracy: 0.7423\n",
      "Epoch: 21/300 Train Loss: 1.2346 Accuracy: 0.6703 Time: 14.16767  | Val Loss: 1.0613 Accuracy: 0.7537\n",
      "Epoch: 22/300 Train Loss: 1.2186 Accuracy: 0.6795 Time: 14.43965  | Val Loss: 1.0560 Accuracy: 0.7519\n",
      "Epoch: 23/300 Train Loss: 1.1991 Accuracy: 0.6882 Time: 13.87423  | Val Loss: 1.0546 Accuracy: 0.7546\n",
      "Epoch: 24/300 Train Loss: 1.1855 Accuracy: 0.6950 Time: 13.98468  | Val Loss: 1.0565 Accuracy: 0.7504\n",
      "Epoch: 25/300 Train Loss: 1.1759 Accuracy: 0.6987 Time: 13.72566  | Val Loss: 1.0604 Accuracy: 0.7594\n",
      "Epoch: 26/300 Train Loss: 1.1551 Accuracy: 0.7073 Time: 13.57836  | Val Loss: 1.0137 Accuracy: 0.7755\n",
      "Epoch: 27/300 Train Loss: 1.1419 Accuracy: 0.7152 Time: 13.94750  | Val Loss: 0.9948 Accuracy: 0.7803\n",
      "Epoch: 28/300 Train Loss: 1.1321 Accuracy: 0.7199 Time: 14.13499  | Val Loss: 1.0098 Accuracy: 0.7780\n",
      "Epoch: 29/300 Train Loss: 1.1205 Accuracy: 0.7226 Time: 14.36171  | Val Loss: 0.9775 Accuracy: 0.7902\n",
      "Epoch: 30/300 Train Loss: 1.1079 Accuracy: 0.7287 Time: 13.57354  | Val Loss: 0.9929 Accuracy: 0.7816\n",
      "Epoch: 31/300 Train Loss: 1.0997 Accuracy: 0.7351 Time: 13.43005  | Val Loss: 0.9605 Accuracy: 0.7997\n",
      "Epoch: 32/300 Train Loss: 1.0902 Accuracy: 0.7369 Time: 13.93018  | Val Loss: 0.9532 Accuracy: 0.8005\n",
      "Epoch: 33/300 Train Loss: 1.0762 Accuracy: 0.7438 Time: 13.67540  | Val Loss: 0.9455 Accuracy: 0.8065\n",
      "Epoch: 34/300 Train Loss: 1.0640 Accuracy: 0.7493 Time: 13.67957  | Val Loss: 0.9477 Accuracy: 0.8086\n",
      "Epoch: 35/300 Train Loss: 1.0627 Accuracy: 0.7477 Time: 13.16900  | Val Loss: 0.9130 Accuracy: 0.8209\n",
      "Epoch: 36/300 Train Loss: 1.0521 Accuracy: 0.7547 Time: 13.17426  | Val Loss: 0.9321 Accuracy: 0.8088\n",
      "Epoch: 37/300 Train Loss: 1.0418 Accuracy: 0.7578 Time: 13.55252  | Val Loss: 0.9330 Accuracy: 0.8106\n",
      "Epoch: 38/300 Train Loss: 1.0388 Accuracy: 0.7609 Time: 13.05718  | Val Loss: 0.9138 Accuracy: 0.8195\n",
      "Epoch: 39/300 Train Loss: 1.0298 Accuracy: 0.7655 Time: 13.55937  | Val Loss: 0.9041 Accuracy: 0.8226\n",
      "Epoch: 40/300 Train Loss: 1.0254 Accuracy: 0.7645 Time: 13.29861  | Val Loss: 0.9085 Accuracy: 0.8220\n",
      "Epoch: 41/300 Train Loss: 1.0116 Accuracy: 0.7724 Time: 13.15357  | Val Loss: 0.9133 Accuracy: 0.8137\n",
      "Epoch: 42/300 Train Loss: 1.0092 Accuracy: 0.7735 Time: 13.07827  | Val Loss: 0.8926 Accuracy: 0.8286\n",
      "Epoch: 43/300 Train Loss: 0.9977 Accuracy: 0.7797 Time: 13.59933  | Val Loss: 0.8902 Accuracy: 0.8314\n",
      "Epoch: 44/300 Train Loss: 0.9906 Accuracy: 0.7828 Time: 13.37349  | Val Loss: 0.8916 Accuracy: 0.8306\n",
      "Epoch: 45/300 Train Loss: 0.9889 Accuracy: 0.7822 Time: 13.16003  | Val Loss: 0.8738 Accuracy: 0.8360\n",
      "Epoch: 46/300 Train Loss: 0.9775 Accuracy: 0.7870 Time: 13.32057  | Val Loss: 0.8655 Accuracy: 0.8427\n",
      "Epoch: 47/300 Train Loss: 0.9700 Accuracy: 0.7912 Time: 13.19037  | Val Loss: 0.8696 Accuracy: 0.8379\n",
      "Epoch: 48/300 Train Loss: 0.9714 Accuracy: 0.7901 Time: 13.35031  | Val Loss: 0.8477 Accuracy: 0.8439\n",
      "Epoch: 49/300 Train Loss: 0.9590 Accuracy: 0.7952 Time: 13.49974  | Val Loss: 0.8510 Accuracy: 0.8456\n",
      "Epoch: 50/300 Train Loss: 0.9556 Accuracy: 0.7965 Time: 14.30534  | Val Loss: 0.8418 Accuracy: 0.8487\n",
      "Epoch: 51/300 Train Loss: 0.9472 Accuracy: 0.8001 Time: 13.78988  | Val Loss: 0.8573 Accuracy: 0.8420\n",
      "Epoch: 52/300 Train Loss: 0.9393 Accuracy: 0.8038 Time: 14.02513  | Val Loss: 0.8325 Accuracy: 0.8539\n",
      "Epoch: 53/300 Train Loss: 0.9324 Accuracy: 0.8076 Time: 13.78023  | Val Loss: 0.8485 Accuracy: 0.8475\n",
      "Epoch: 54/300 Train Loss: 0.9326 Accuracy: 0.8066 Time: 13.17757  | Val Loss: 0.8263 Accuracy: 0.8562\n",
      "Epoch: 55/300 Train Loss: 0.9280 Accuracy: 0.8073 Time: 13.39226  | Val Loss: 0.8465 Accuracy: 0.8480\n",
      "Epoch: 56/300 Train Loss: 0.9181 Accuracy: 0.8145 Time: 13.72322  | Val Loss: 0.8310 Accuracy: 0.8548\n",
      "Epoch: 57/300 Train Loss: 0.9162 Accuracy: 0.8130 Time: 13.72952  | Val Loss: 0.8224 Accuracy: 0.8607\n",
      "Epoch: 58/300 Train Loss: 0.9120 Accuracy: 0.8153 Time: 13.64937  | Val Loss: 0.8335 Accuracy: 0.8487\n",
      "Epoch: 59/300 Train Loss: 0.9069 Accuracy: 0.8190 Time: 13.96667  | Val Loss: 0.8282 Accuracy: 0.8552\n",
      "Epoch: 60/300 Train Loss: 0.8971 Accuracy: 0.8242 Time: 13.60233  | Val Loss: 0.8072 Accuracy: 0.8673\n",
      "Epoch: 61/300 Train Loss: 0.8954 Accuracy: 0.8235 Time: 13.95503  | Val Loss: 0.8155 Accuracy: 0.8623\n",
      "Epoch: 62/300 Train Loss: 0.8949 Accuracy: 0.8227 Time: 14.05922  | Val Loss: 0.8096 Accuracy: 0.8635\n",
      "Epoch: 63/300 Train Loss: 0.8843 Accuracy: 0.8288 Time: 13.73568  | Val Loss: 0.8063 Accuracy: 0.8692\n",
      "Epoch: 64/300 Train Loss: 0.8886 Accuracy: 0.8258 Time: 13.60310  | Val Loss: 0.8092 Accuracy: 0.8663\n",
      "Epoch: 65/300 Train Loss: 0.8778 Accuracy: 0.8304 Time: 14.10703  | Val Loss: 0.7980 Accuracy: 0.8692\n",
      "Epoch: 66/300 Train Loss: 0.8758 Accuracy: 0.8329 Time: 13.92482  | Val Loss: 0.7915 Accuracy: 0.8702\n",
      "Epoch: 67/300 Train Loss: 0.8737 Accuracy: 0.8324 Time: 13.91819  | Val Loss: 0.7959 Accuracy: 0.8686\n",
      "Epoch: 68/300 Train Loss: 0.8692 Accuracy: 0.8358 Time: 13.60657  | Val Loss: 0.8075 Accuracy: 0.8664\n",
      "Epoch: 69/300 Train Loss: 0.8701 Accuracy: 0.8348 Time: 14.18263  | Val Loss: 0.8044 Accuracy: 0.8658\n",
      "Epoch: 70/300 Train Loss: 0.8583 Accuracy: 0.8409 Time: 13.93749  | Val Loss: 0.7819 Accuracy: 0.8741\n",
      "Epoch: 71/300 Train Loss: 0.8585 Accuracy: 0.8415 Time: 14.34504  | Val Loss: 0.8044 Accuracy: 0.8672\n",
      "Epoch: 72/300 Train Loss: 0.8512 Accuracy: 0.8427 Time: 13.84881  | Val Loss: 0.7926 Accuracy: 0.8739\n",
      "Epoch: 73/300 Train Loss: 0.8544 Accuracy: 0.8406 Time: 14.26676  | Val Loss: 0.7640 Accuracy: 0.8855\n",
      "Epoch: 74/300 Train Loss: 0.8460 Accuracy: 0.8464 Time: 13.94975  | Val Loss: 0.7848 Accuracy: 0.8760\n",
      "Epoch: 75/300 Train Loss: 0.8428 Accuracy: 0.8463 Time: 13.70866  | Val Loss: 0.7783 Accuracy: 0.8798\n",
      "Epoch: 76/300 Train Loss: 0.8451 Accuracy: 0.8468 Time: 14.02384  | Val Loss: 0.7775 Accuracy: 0.8788\n",
      "Epoch: 77/300 Train Loss: 0.8344 Accuracy: 0.8504 Time: 13.87795  | Val Loss: 0.7857 Accuracy: 0.8729\n",
      "Epoch: 78/300 Train Loss: 0.8338 Accuracy: 0.8503 Time: 13.81580  | Val Loss: 0.7713 Accuracy: 0.8840\n",
      "Epoch: 79/300 Train Loss: 0.8288 Accuracy: 0.8537 Time: 13.92816  | Val Loss: 0.7649 Accuracy: 0.8836\n",
      "Epoch: 80/300 Train Loss: 0.8244 Accuracy: 0.8548 Time: 13.29062  | Val Loss: 0.7811 Accuracy: 0.8794\n",
      "Epoch: 81/300 Train Loss: 0.8264 Accuracy: 0.8554 Time: 13.81644  | Val Loss: 0.7709 Accuracy: 0.8803\n",
      "Epoch: 82/300 Train Loss: 0.8263 Accuracy: 0.8539 Time: 13.19018  | Val Loss: 0.7762 Accuracy: 0.8839\n",
      "Epoch: 83/300 Train Loss: 0.8174 Accuracy: 0.8584 Time: 13.20746  | Val Loss: 0.7791 Accuracy: 0.8791\n",
      "Epoch: 84/300 Train Loss: 0.8180 Accuracy: 0.8590 Time: 13.67189  | Val Loss: 0.7711 Accuracy: 0.8787\n",
      "Epoch: 85/300 Train Loss: 0.8113 Accuracy: 0.8615 Time: 13.41450  | Val Loss: 0.7693 Accuracy: 0.8834\n",
      "Epoch: 86/300 Train Loss: 0.8104 Accuracy: 0.8608 Time: 13.57812  | Val Loss: 0.7565 Accuracy: 0.8849\n",
      "Epoch: 87/300 Train Loss: 0.8085 Accuracy: 0.8624 Time: 13.69870  | Val Loss: 0.7593 Accuracy: 0.8896\n",
      "Epoch: 88/300 Train Loss: 0.8051 Accuracy: 0.8652 Time: 13.49453  | Val Loss: 0.7649 Accuracy: 0.8898\n",
      "Epoch: 89/300 Train Loss: 0.7992 Accuracy: 0.8669 Time: 13.38609  | Val Loss: 0.7542 Accuracy: 0.8915\n",
      "Epoch: 90/300 Train Loss: 0.8071 Accuracy: 0.8637 Time: 13.33387  | Val Loss: 0.7688 Accuracy: 0.8838\n",
      "Epoch: 91/300 Train Loss: 0.7961 Accuracy: 0.8673 Time: 13.59062  | Val Loss: 0.7622 Accuracy: 0.8867\n",
      "Epoch: 92/300 Train Loss: 0.7948 Accuracy: 0.8704 Time: 13.29877  | Val Loss: 0.7634 Accuracy: 0.8829\n",
      "Epoch: 93/300 Train Loss: 0.7885 Accuracy: 0.8695 Time: 14.14190  | Val Loss: 0.7579 Accuracy: 0.8870\n",
      "Epoch: 94/300 Train Loss: 0.7916 Accuracy: 0.8698 Time: 14.26117  | Val Loss: 0.7687 Accuracy: 0.8811\n",
      "Epoch: 95/300 Train Loss: 0.7852 Accuracy: 0.8721 Time: 13.58914  | Val Loss: 0.7543 Accuracy: 0.8933\n",
      "Epoch: 96/300 Train Loss: 0.7817 Accuracy: 0.8734 Time: 13.94493  | Val Loss: 0.7622 Accuracy: 0.8873\n",
      "Epoch: 97/300 Train Loss: 0.7803 Accuracy: 0.8746 Time: 13.75356  | Val Loss: 0.7650 Accuracy: 0.8853\n",
      "Epoch: 98/300 Train Loss: 0.7802 Accuracy: 0.8763 Time: 13.74465  | Val Loss: 0.7594 Accuracy: 0.8874\n",
      "Epoch: 99/300 Train Loss: 0.7773 Accuracy: 0.8761 Time: 14.07606  | Val Loss: 0.7406 Accuracy: 0.8958\n",
      "Epoch: 100/300 Train Loss: 0.7734 Accuracy: 0.8775 Time: 13.36972  | Val Loss: 0.7499 Accuracy: 0.8926\n",
      "Epoch: 101/300 Train Loss: 0.7694 Accuracy: 0.8794 Time: 13.58846  | Val Loss: 0.7638 Accuracy: 0.8825\n",
      "Epoch: 102/300 Train Loss: 0.7697 Accuracy: 0.8797 Time: 13.37177  | Val Loss: 0.7586 Accuracy: 0.8893\n",
      "Epoch: 103/300 Train Loss: 0.7674 Accuracy: 0.8808 Time: 13.38359  | Val Loss: 0.7510 Accuracy: 0.8944\n",
      "Epoch: 104/300 Train Loss: 0.7646 Accuracy: 0.8827 Time: 13.16356  | Val Loss: 0.7528 Accuracy: 0.8919\n",
      "Epoch: 105/300 Train Loss: 0.7650 Accuracy: 0.8820 Time: 13.43050  | Val Loss: 0.7494 Accuracy: 0.8950\n",
      "Epoch: 106/300 Train Loss: 0.7586 Accuracy: 0.8854 Time: 13.45952  | Val Loss: 0.7537 Accuracy: 0.8921\n",
      "Epoch: 107/300 Train Loss: 0.7601 Accuracy: 0.8843 Time: 13.47775  | Val Loss: 0.7514 Accuracy: 0.8932\n",
      "Epoch: 108/300 Train Loss: 0.7539 Accuracy: 0.8861 Time: 13.34745  | Val Loss: 0.7500 Accuracy: 0.8944\n",
      "Epoch: 109/300 Train Loss: 0.7539 Accuracy: 0.8867 Time: 13.50551  | Val Loss: 0.7401 Accuracy: 0.8953\n",
      "Epoch: 110/300 Train Loss: 0.7523 Accuracy: 0.8864 Time: 14.07139  | Val Loss: 0.7413 Accuracy: 0.8952\n",
      "Epoch: 111/300 Train Loss: 0.7490 Accuracy: 0.8883 Time: 13.20591  | Val Loss: 0.7585 Accuracy: 0.8902\n",
      "Epoch: 112/300 Train Loss: 0.7463 Accuracy: 0.8900 Time: 13.32334  | Val Loss: 0.7563 Accuracy: 0.8913\n",
      "Epoch: 113/300 Train Loss: 0.7424 Accuracy: 0.8926 Time: 13.35374  | Val Loss: 0.7397 Accuracy: 0.8995\n",
      "Epoch: 114/300 Train Loss: 0.7436 Accuracy: 0.8912 Time: 13.24305  | Val Loss: 0.7330 Accuracy: 0.9012\n",
      "Epoch: 115/300 Train Loss: 0.7423 Accuracy: 0.8920 Time: 13.28759  | Val Loss: 0.7423 Accuracy: 0.8999\n",
      "Epoch: 116/300 Train Loss: 0.7396 Accuracy: 0.8922 Time: 13.06621  | Val Loss: 0.7646 Accuracy: 0.8875\n",
      "Epoch: 117/300 Train Loss: 0.7410 Accuracy: 0.8918 Time: 13.16198  | Val Loss: 0.7509 Accuracy: 0.8954\n",
      "Epoch: 118/300 Train Loss: 0.7334 Accuracy: 0.8960 Time: 13.30639  | Val Loss: 0.7379 Accuracy: 0.8959\n",
      "Epoch: 119/300 Train Loss: 0.7327 Accuracy: 0.8965 Time: 13.40249  | Val Loss: 0.7380 Accuracy: 0.8960\n",
      "Epoch: 120/300 Train Loss: 0.7307 Accuracy: 0.8977 Time: 13.14490  | Val Loss: 0.7407 Accuracy: 0.8981\n",
      "Epoch: 121/300 Train Loss: 0.7303 Accuracy: 0.8972 Time: 13.03278  | Val Loss: 0.7382 Accuracy: 0.8972\n",
      "Epoch: 122/300 Train Loss: 0.7284 Accuracy: 0.8985 Time: 13.15757  | Val Loss: 0.7391 Accuracy: 0.9001\n",
      "Epoch: 123/300 Train Loss: 0.7235 Accuracy: 0.9001 Time: 13.59022  | Val Loss: 0.7318 Accuracy: 0.8980\n",
      "Epoch: 124/300 Train Loss: 0.7208 Accuracy: 0.9018 Time: 13.16470  | Val Loss: 0.7359 Accuracy: 0.9020\n",
      "Epoch: 125/300 Train Loss: 0.7212 Accuracy: 0.9013 Time: 13.38783  | Val Loss: 0.7369 Accuracy: 0.8993\n",
      "Epoch: 126/300 Train Loss: 0.7237 Accuracy: 0.9004 Time: 13.36880  | Val Loss: 0.7343 Accuracy: 0.9032\n",
      "Epoch: 127/300 Train Loss: 0.7140 Accuracy: 0.9039 Time: 13.38544  | Val Loss: 0.7310 Accuracy: 0.9041\n",
      "Epoch: 128/300 Train Loss: 0.7182 Accuracy: 0.9029 Time: 13.06075  | Val Loss: 0.7347 Accuracy: 0.9013\n",
      "Epoch: 129/300 Train Loss: 0.7142 Accuracy: 0.9046 Time: 13.44322  | Val Loss: 0.7390 Accuracy: 0.8994\n",
      "Epoch: 130/300 Train Loss: 0.7113 Accuracy: 0.9049 Time: 13.54732  | Val Loss: 0.7478 Accuracy: 0.8980\n",
      "Epoch: 131/300 Train Loss: 0.7108 Accuracy: 0.9058 Time: 13.24052  | Val Loss: 0.7538 Accuracy: 0.8969\n",
      "Epoch: 132/300 Train Loss: 0.7130 Accuracy: 0.9045 Time: 13.46790  | Val Loss: 0.7420 Accuracy: 0.9014\n",
      "Epoch: 133/300 Train Loss: 0.7113 Accuracy: 0.9053 Time: 13.64512  | Val Loss: 0.7372 Accuracy: 0.8981\n",
      "Epoch: 134/300 Train Loss: 0.7052 Accuracy: 0.9086 Time: 13.51423  | Val Loss: 0.7374 Accuracy: 0.9000\n",
      "Epoch: 135/300 Train Loss: 0.7043 Accuracy: 0.9086 Time: 13.59033  | Val Loss: 0.7336 Accuracy: 0.9028\n",
      "Epoch: 136/300 Train Loss: 0.7004 Accuracy: 0.9109 Time: 14.04458  | Val Loss: 0.7511 Accuracy: 0.8970\n",
      "Epoch: 137/300 Train Loss: 0.7047 Accuracy: 0.9100 Time: 14.34967  | Val Loss: 0.7446 Accuracy: 0.8960\n",
      "Epoch: 138/300 Train Loss: 0.6987 Accuracy: 0.9111 Time: 14.13623  | Val Loss: 0.7423 Accuracy: 0.8993\n",
      "Epoch: 139/300 Train Loss: 0.6983 Accuracy: 0.9112 Time: 13.75065  | Val Loss: 0.7487 Accuracy: 0.8982\n",
      "Epoch: 140/300 Train Loss: 0.6925 Accuracy: 0.9143 Time: 13.67028  | Val Loss: 0.7285 Accuracy: 0.9040\n",
      "Epoch: 141/300 Train Loss: 0.6952 Accuracy: 0.9127 Time: 14.06238  | Val Loss: 0.7456 Accuracy: 0.8969\n",
      "Epoch: 142/300 Train Loss: 0.6952 Accuracy: 0.9138 Time: 13.56801  | Val Loss: 0.7259 Accuracy: 0.9072\n",
      "Epoch: 143/300 Train Loss: 0.6919 Accuracy: 0.9152 Time: 14.01412  | Val Loss: 0.7210 Accuracy: 0.9082\n",
      "Epoch: 144/300 Train Loss: 0.6875 Accuracy: 0.9170 Time: 13.60550  | Val Loss: 0.7423 Accuracy: 0.9018\n",
      "Epoch: 145/300 Train Loss: 0.6885 Accuracy: 0.9157 Time: 13.75953  | Val Loss: 0.7280 Accuracy: 0.9065\n",
      "Epoch: 146/300 Train Loss: 0.6850 Accuracy: 0.9183 Time: 13.63566  | Val Loss: 0.7292 Accuracy: 0.9044\n",
      "Epoch: 147/300 Train Loss: 0.6844 Accuracy: 0.9169 Time: 14.27502  | Val Loss: 0.7265 Accuracy: 0.9073\n",
      "Epoch: 148/300 Train Loss: 0.6886 Accuracy: 0.9154 Time: 14.07831  | Val Loss: 0.7370 Accuracy: 0.9035\n",
      "Epoch: 149/300 Train Loss: 0.6817 Accuracy: 0.9185 Time: 13.87581  | Val Loss: 0.7312 Accuracy: 0.9057\n",
      "Epoch: 150/300 Train Loss: 0.6802 Accuracy: 0.9203 Time: 13.75069  | Val Loss: 0.7269 Accuracy: 0.9098\n",
      "Epoch: 151/300 Train Loss: 0.6843 Accuracy: 0.9181 Time: 13.49077  | Val Loss: 0.7435 Accuracy: 0.9011\n",
      "Epoch: 152/300 Train Loss: 0.6804 Accuracy: 0.9199 Time: 14.32327  | Val Loss: 0.7416 Accuracy: 0.9009\n",
      "Epoch: 153/300 Train Loss: 0.6744 Accuracy: 0.9227 Time: 14.05048  | Val Loss: 0.7230 Accuracy: 0.9094\n",
      "Epoch: 154/300 Train Loss: 0.6788 Accuracy: 0.9200 Time: 14.06056  | Val Loss: 0.7373 Accuracy: 0.9042\n",
      "Epoch: 155/300 Train Loss: 0.6748 Accuracy: 0.9223 Time: 14.57757  | Val Loss: 0.7250 Accuracy: 0.9066\n",
      "Epoch: 156/300 Train Loss: 0.6759 Accuracy: 0.9223 Time: 13.82748  | Val Loss: 0.7406 Accuracy: 0.9017\n",
      "Epoch: 157/300 Train Loss: 0.6750 Accuracy: 0.9221 Time: 14.01410  | Val Loss: 0.7252 Accuracy: 0.9080\n",
      "Epoch: 158/300 Train Loss: 0.6701 Accuracy: 0.9234 Time: 14.08196  | Val Loss: 0.7303 Accuracy: 0.9061\n",
      "Epoch: 159/300 Train Loss: 0.6737 Accuracy: 0.9230 Time: 13.98976  | Val Loss: 0.7278 Accuracy: 0.9081\n",
      "Epoch: 160/300 Train Loss: 0.6664 Accuracy: 0.9267 Time: 13.94595  | Val Loss: 0.7283 Accuracy: 0.9072\n",
      "Epoch: 161/300 Train Loss: 0.6671 Accuracy: 0.9253 Time: 13.69533  | Val Loss: 0.7268 Accuracy: 0.9089\n",
      "Epoch: 162/300 Train Loss: 0.6640 Accuracy: 0.9268 Time: 14.26346  | Val Loss: 0.7323 Accuracy: 0.9032\n",
      "Epoch: 163/300 Train Loss: 0.6620 Accuracy: 0.9278 Time: 14.20845  | Val Loss: 0.7235 Accuracy: 0.9101\n",
      "Epoch: 164/300 Train Loss: 0.6645 Accuracy: 0.9269 Time: 14.61488  | Val Loss: 0.7296 Accuracy: 0.9096\n",
      "Epoch: 165/300 Train Loss: 0.6627 Accuracy: 0.9275 Time: 13.84256  | Val Loss: 0.7271 Accuracy: 0.9062\n",
      "Epoch: 166/300 Train Loss: 0.6629 Accuracy: 0.9277 Time: 13.65221  | Val Loss: 0.7295 Accuracy: 0.9079\n",
      "Epoch: 167/300 Train Loss: 0.6622 Accuracy: 0.9262 Time: 13.54114  | Val Loss: 0.7313 Accuracy: 0.9046\n",
      "Epoch: 168/300 Train Loss: 0.6547 Accuracy: 0.9313 Time: 13.94757  | Val Loss: 0.7223 Accuracy: 0.9115\n",
      "Epoch: 169/300 Train Loss: 0.6551 Accuracy: 0.9304 Time: 13.45172  | Val Loss: 0.7295 Accuracy: 0.9087\n",
      "Epoch: 170/300 Train Loss: 0.6524 Accuracy: 0.9319 Time: 13.34116  | Val Loss: 0.7325 Accuracy: 0.9083\n",
      "Epoch: 171/300 Train Loss: 0.6543 Accuracy: 0.9317 Time: 13.57558  | Val Loss: 0.7163 Accuracy: 0.9136\n",
      "Epoch: 172/300 Train Loss: 0.6510 Accuracy: 0.9334 Time: 13.45524  | Val Loss: 0.7270 Accuracy: 0.9091\n",
      "Epoch: 173/300 Train Loss: 0.6486 Accuracy: 0.9344 Time: 13.90594  | Val Loss: 0.7307 Accuracy: 0.9084\n",
      "Epoch: 174/300 Train Loss: 0.6510 Accuracy: 0.9324 Time: 13.72629  | Val Loss: 0.7392 Accuracy: 0.9055\n",
      "Epoch: 175/300 Train Loss: 0.6499 Accuracy: 0.9330 Time: 13.41289  | Val Loss: 0.7374 Accuracy: 0.9054\n",
      "Epoch: 176/300 Train Loss: 0.6485 Accuracy: 0.9340 Time: 13.42250  | Val Loss: 0.7241 Accuracy: 0.9123\n",
      "Epoch: 177/300 Train Loss: 0.6456 Accuracy: 0.9355 Time: 13.50971  | Val Loss: 0.7288 Accuracy: 0.9102\n",
      "Epoch: 178/300 Train Loss: 0.6509 Accuracy: 0.9333 Time: 13.34618  | Val Loss: 0.7233 Accuracy: 0.9116\n",
      "Epoch: 179/300 Train Loss: 0.6457 Accuracy: 0.9340 Time: 13.47505  | Val Loss: 0.7183 Accuracy: 0.9113\n",
      "Epoch: 180/300 Train Loss: 0.6456 Accuracy: 0.9353 Time: 13.53847  | Val Loss: 0.7249 Accuracy: 0.9102\n",
      "Epoch: 181/300 Train Loss: 0.6457 Accuracy: 0.9351 Time: 13.55698  | Val Loss: 0.7187 Accuracy: 0.9135\n",
      "Epoch: 182/300 Train Loss: 0.6429 Accuracy: 0.9362 Time: 13.55779  | Val Loss: 0.7258 Accuracy: 0.9103\n",
      "Epoch: 183/300 Train Loss: 0.6394 Accuracy: 0.9381 Time: 13.56002  | Val Loss: 0.7341 Accuracy: 0.9079\n",
      "Epoch: 184/300 Train Loss: 0.6412 Accuracy: 0.9377 Time: 13.58151  | Val Loss: 0.7290 Accuracy: 0.9107\n",
      "Epoch: 185/300 Train Loss: 0.6374 Accuracy: 0.9389 Time: 13.54911  | Val Loss: 0.7197 Accuracy: 0.9134\n",
      "Epoch: 186/300 Train Loss: 0.6401 Accuracy: 0.9367 Time: 13.94852  | Val Loss: 0.7152 Accuracy: 0.9136\n",
      "Epoch: 187/300 Train Loss: 0.6368 Accuracy: 0.9403 Time: 13.43848  | Val Loss: 0.7206 Accuracy: 0.9126\n",
      "Epoch: 188/300 Train Loss: 0.6366 Accuracy: 0.9394 Time: 13.24244  | Val Loss: 0.7285 Accuracy: 0.9101\n",
      "Epoch: 189/300 Train Loss: 0.6359 Accuracy: 0.9398 Time: 13.66840  | Val Loss: 0.7275 Accuracy: 0.9121\n",
      "Epoch: 190/300 Train Loss: 0.6382 Accuracy: 0.9377 Time: 13.36735  | Val Loss: 0.7222 Accuracy: 0.9116\n",
      "Epoch: 191/300 Train Loss: 0.6339 Accuracy: 0.9407 Time: 13.33938  | Val Loss: 0.7218 Accuracy: 0.9138\n",
      "Epoch: 192/300 Train Loss: 0.6329 Accuracy: 0.9406 Time: 13.73542  | Val Loss: 0.7180 Accuracy: 0.9144\n",
      "Epoch: 193/300 Train Loss: 0.6294 Accuracy: 0.9435 Time: 13.33696  | Val Loss: 0.7292 Accuracy: 0.9126\n",
      "Epoch: 194/300 Train Loss: 0.6279 Accuracy: 0.9433 Time: 13.33704  | Val Loss: 0.7233 Accuracy: 0.9148\n",
      "Epoch: 195/300 Train Loss: 0.6301 Accuracy: 0.9416 Time: 13.31497  | Val Loss: 0.7271 Accuracy: 0.9125\n",
      "Epoch: 196/300 Train Loss: 0.6265 Accuracy: 0.9437 Time: 13.62743  | Val Loss: 0.7181 Accuracy: 0.9167\n",
      "Epoch: 197/300 Train Loss: 0.6236 Accuracy: 0.9457 Time: 13.55388  | Val Loss: 0.7293 Accuracy: 0.9137\n",
      "Epoch: 198/300 Train Loss: 0.6244 Accuracy: 0.9455 Time: 13.39651  | Val Loss: 0.7296 Accuracy: 0.9129\n",
      "Epoch: 199/300 Train Loss: 0.6255 Accuracy: 0.9445 Time: 13.72369  | Val Loss: 0.7246 Accuracy: 0.9128\n",
      "Epoch: 200/300 Train Loss: 0.6226 Accuracy: 0.9453 Time: 14.20419  | Val Loss: 0.7242 Accuracy: 0.9134\n",
      "Epoch: 201/300 Train Loss: 0.6271 Accuracy: 0.9427 Time: 13.64213  | Val Loss: 0.7155 Accuracy: 0.9133\n",
      "Epoch: 202/300 Train Loss: 0.6210 Accuracy: 0.9472 Time: 13.60168  | Val Loss: 0.7384 Accuracy: 0.9092\n",
      "Epoch: 203/300 Train Loss: 0.6215 Accuracy: 0.9461 Time: 14.00882  | Val Loss: 0.7156 Accuracy: 0.9148\n",
      "Epoch: 204/300 Train Loss: 0.6198 Accuracy: 0.9470 Time: 13.91251  | Val Loss: 0.7233 Accuracy: 0.9130\n",
      "Epoch: 205/300 Train Loss: 0.6185 Accuracy: 0.9471 Time: 13.69313  | Val Loss: 0.7197 Accuracy: 0.9152\n",
      "Epoch: 206/300 Train Loss: 0.6203 Accuracy: 0.9464 Time: 13.50718  | Val Loss: 0.7323 Accuracy: 0.9115\n",
      "Epoch: 207/300 Train Loss: 0.6196 Accuracy: 0.9466 Time: 13.44973  | Val Loss: 0.7219 Accuracy: 0.9159\n",
      "Epoch: 208/300 Train Loss: 0.6183 Accuracy: 0.9470 Time: 14.23769  | Val Loss: 0.7192 Accuracy: 0.9159\n",
      "Epoch: 209/300 Train Loss: 0.6151 Accuracy: 0.9485 Time: 14.06270  | Val Loss: 0.7291 Accuracy: 0.9125\n",
      "Epoch: 210/300 Train Loss: 0.6144 Accuracy: 0.9492 Time: 14.01120  | Val Loss: 0.7257 Accuracy: 0.9161\n",
      "Epoch: 211/300 Train Loss: 0.6125 Accuracy: 0.9496 Time: 14.65887  | Val Loss: 0.7091 Accuracy: 0.9168\n",
      "Epoch: 212/300 Train Loss: 0.6140 Accuracy: 0.9490 Time: 14.27821  | Val Loss: 0.7114 Accuracy: 0.9187\n",
      "Epoch: 213/300 Train Loss: 0.6139 Accuracy: 0.9497 Time: 13.77451  | Val Loss: 0.7177 Accuracy: 0.9154\n",
      "Epoch: 214/300 Train Loss: 0.6121 Accuracy: 0.9503 Time: 13.89808  | Val Loss: 0.7249 Accuracy: 0.9145\n",
      "Epoch: 215/300 Train Loss: 0.6123 Accuracy: 0.9494 Time: 14.30827  | Val Loss: 0.7212 Accuracy: 0.9174\n",
      "Epoch: 216/300 Train Loss: 0.6106 Accuracy: 0.9511 Time: 14.15092  | Val Loss: 0.7235 Accuracy: 0.9175\n",
      "Epoch: 217/300 Train Loss: 0.6117 Accuracy: 0.9506 Time: 14.60835  | Val Loss: 0.7240 Accuracy: 0.9159\n",
      "Epoch: 218/300 Train Loss: 0.6090 Accuracy: 0.9519 Time: 14.89023  | Val Loss: 0.7117 Accuracy: 0.9188\n",
      "Epoch: 219/300 Train Loss: 0.6080 Accuracy: 0.9520 Time: 13.84978  | Val Loss: 0.7169 Accuracy: 0.9182\n",
      "Epoch: 220/300 Train Loss: 0.6088 Accuracy: 0.9519 Time: 13.39152  | Val Loss: 0.7227 Accuracy: 0.9153\n",
      "Epoch: 221/300 Train Loss: 0.6059 Accuracy: 0.9527 Time: 13.95387  | Val Loss: 0.7170 Accuracy: 0.9169\n",
      "Epoch: 222/300 Train Loss: 0.6061 Accuracy: 0.9530 Time: 13.83268  | Val Loss: 0.7173 Accuracy: 0.9181\n",
      "Epoch: 223/300 Train Loss: 0.6061 Accuracy: 0.9539 Time: 13.50692  | Val Loss: 0.7131 Accuracy: 0.9189\n",
      "Epoch: 224/300 Train Loss: 0.6044 Accuracy: 0.9534 Time: 13.37112  | Val Loss: 0.7138 Accuracy: 0.9179\n",
      "Epoch: 225/300 Train Loss: 0.6022 Accuracy: 0.9549 Time: 13.39857  | Val Loss: 0.7210 Accuracy: 0.9184\n",
      "Epoch: 226/300 Train Loss: 0.6036 Accuracy: 0.9540 Time: 13.51560  | Val Loss: 0.7187 Accuracy: 0.9170\n",
      "Epoch: 227/300 Train Loss: 0.6026 Accuracy: 0.9546 Time: 13.56256  | Val Loss: 0.7186 Accuracy: 0.9170\n",
      "Epoch: 228/300 Train Loss: 0.6012 Accuracy: 0.9546 Time: 13.45771  | Val Loss: 0.7198 Accuracy: 0.9174\n",
      "Epoch: 229/300 Train Loss: 0.5995 Accuracy: 0.9558 Time: 13.39834  | Val Loss: 0.7157 Accuracy: 0.9212\n",
      "Epoch: 230/300 Train Loss: 0.6068 Accuracy: 0.9529 Time: 13.58768  | Val Loss: 0.7145 Accuracy: 0.9182\n",
      "Epoch: 231/300 Train Loss: 0.6038 Accuracy: 0.9540 Time: 13.63558  | Val Loss: 0.7172 Accuracy: 0.9190\n",
      "Epoch: 232/300 Train Loss: 0.6028 Accuracy: 0.9548 Time: 13.92954  | Val Loss: 0.7202 Accuracy: 0.9164\n",
      "Epoch: 233/300 Train Loss: 0.5981 Accuracy: 0.9569 Time: 13.62840  | Val Loss: 0.7146 Accuracy: 0.9192\n",
      "Epoch: 234/300 Train Loss: 0.5954 Accuracy: 0.9578 Time: 13.72862  | Val Loss: 0.7138 Accuracy: 0.9176\n",
      "Epoch: 235/300 Train Loss: 0.5997 Accuracy: 0.9550 Time: 13.75073  | Val Loss: 0.7160 Accuracy: 0.9197\n",
      "Epoch: 236/300 Train Loss: 0.5955 Accuracy: 0.9582 Time: 13.89758  | Val Loss: 0.7165 Accuracy: 0.9189\n",
      "Epoch: 237/300 Train Loss: 0.5958 Accuracy: 0.9579 Time: 13.77162  | Val Loss: 0.7137 Accuracy: 0.9183\n",
      "Epoch: 238/300 Train Loss: 0.5969 Accuracy: 0.9570 Time: 13.50182  | Val Loss: 0.7132 Accuracy: 0.9203\n",
      "Epoch: 239/300 Train Loss: 0.5948 Accuracy: 0.9582 Time: 13.41152  | Val Loss: 0.7127 Accuracy: 0.9195\n",
      "Epoch: 240/300 Train Loss: 0.5984 Accuracy: 0.9560 Time: 13.55439  | Val Loss: 0.7141 Accuracy: 0.9209\n",
      "Epoch: 241/300 Train Loss: 0.5925 Accuracy: 0.9597 Time: 13.47171  | Val Loss: 0.7211 Accuracy: 0.9194\n",
      "Epoch: 242/300 Train Loss: 0.5942 Accuracy: 0.9586 Time: 13.32120  | Val Loss: 0.7216 Accuracy: 0.9169\n",
      "Epoch: 243/300 Train Loss: 0.5972 Accuracy: 0.9561 Time: 13.31323  | Val Loss: 0.7098 Accuracy: 0.9213\n",
      "Epoch: 244/300 Train Loss: 0.5948 Accuracy: 0.9583 Time: 13.44446  | Val Loss: 0.7138 Accuracy: 0.9203\n",
      "Epoch: 245/300 Train Loss: 0.5924 Accuracy: 0.9589 Time: 13.51920  | Val Loss: 0.7150 Accuracy: 0.9203\n",
      "Epoch: 246/300 Train Loss: 0.5909 Accuracy: 0.9596 Time: 13.47467  | Val Loss: 0.7152 Accuracy: 0.9186\n",
      "Epoch: 247/300 Train Loss: 0.5907 Accuracy: 0.9600 Time: 13.38450  | Val Loss: 0.7160 Accuracy: 0.9203\n",
      "Epoch: 248/300 Train Loss: 0.5936 Accuracy: 0.9591 Time: 13.69627  | Val Loss: 0.7137 Accuracy: 0.9212\n",
      "Epoch: 249/300 Train Loss: 0.5906 Accuracy: 0.9594 Time: 13.54070  | Val Loss: 0.7155 Accuracy: 0.9198\n",
      "Epoch: 250/300 Train Loss: 0.5903 Accuracy: 0.9596 Time: 13.60481  | Val Loss: 0.7164 Accuracy: 0.9185\n",
      "Epoch: 251/300 Train Loss: 0.5891 Accuracy: 0.9605 Time: 13.31747  | Val Loss: 0.7142 Accuracy: 0.9187\n",
      "Epoch: 252/300 Train Loss: 0.5897 Accuracy: 0.9597 Time: 13.34344  | Val Loss: 0.7168 Accuracy: 0.9188\n",
      "Epoch: 253/300 Train Loss: 0.5904 Accuracy: 0.9598 Time: 13.73337  | Val Loss: 0.7145 Accuracy: 0.9207\n",
      "Epoch: 254/300 Train Loss: 0.5874 Accuracy: 0.9619 Time: 13.36349  | Val Loss: 0.7150 Accuracy: 0.9201\n",
      "Epoch: 255/300 Train Loss: 0.5858 Accuracy: 0.9626 Time: 13.71797  | Val Loss: 0.7198 Accuracy: 0.9180\n",
      "Epoch: 256/300 Train Loss: 0.5865 Accuracy: 0.9623 Time: 13.70006  | Val Loss: 0.7131 Accuracy: 0.9209\n",
      "Epoch: 257/300 Train Loss: 0.5855 Accuracy: 0.9618 Time: 13.70615  | Val Loss: 0.7149 Accuracy: 0.9195\n",
      "Epoch: 258/300 Train Loss: 0.5878 Accuracy: 0.9605 Time: 13.39917  | Val Loss: 0.7096 Accuracy: 0.9223\n",
      "Epoch: 259/300 Train Loss: 0.5864 Accuracy: 0.9607 Time: 13.67326  | Val Loss: 0.7115 Accuracy: 0.9206\n",
      "Epoch: 260/300 Train Loss: 0.5830 Accuracy: 0.9632 Time: 13.41653  | Val Loss: 0.7114 Accuracy: 0.9220\n",
      "Epoch: 261/300 Train Loss: 0.5863 Accuracy: 0.9618 Time: 13.29412  | Val Loss: 0.7073 Accuracy: 0.9226\n",
      "Epoch: 262/300 Train Loss: 0.5812 Accuracy: 0.9638 Time: 13.63912  | Val Loss: 0.7084 Accuracy: 0.9218\n",
      "Epoch: 263/300 Train Loss: 0.5835 Accuracy: 0.9632 Time: 13.44813  | Val Loss: 0.7069 Accuracy: 0.9218\n",
      "Epoch: 264/300 Train Loss: 0.5822 Accuracy: 0.9639 Time: 13.73800  | Val Loss: 0.7064 Accuracy: 0.9229\n",
      "Epoch: 265/300 Train Loss: 0.5829 Accuracy: 0.9635 Time: 13.50679  | Val Loss: 0.7100 Accuracy: 0.9210\n",
      "Epoch: 266/300 Train Loss: 0.5817 Accuracy: 0.9636 Time: 13.67071  | Val Loss: 0.7094 Accuracy: 0.9215\n",
      "Epoch: 267/300 Train Loss: 0.5839 Accuracy: 0.9623 Time: 13.74879  | Val Loss: 0.7161 Accuracy: 0.9213\n",
      "Epoch: 268/300 Train Loss: 0.5807 Accuracy: 0.9648 Time: 13.95465  | Val Loss: 0.7118 Accuracy: 0.9220\n",
      "Epoch: 269/300 Train Loss: 0.5814 Accuracy: 0.9643 Time: 13.48995  | Val Loss: 0.7098 Accuracy: 0.9235\n",
      "Epoch: 270/300 Train Loss: 0.5804 Accuracy: 0.9651 Time: 13.54035  | Val Loss: 0.7119 Accuracy: 0.9221\n",
      "Epoch: 271/300 Train Loss: 0.5817 Accuracy: 0.9646 Time: 13.58626  | Val Loss: 0.7125 Accuracy: 0.9207\n",
      "Epoch: 272/300 Train Loss: 0.5796 Accuracy: 0.9645 Time: 13.88688  | Val Loss: 0.7103 Accuracy: 0.9238\n",
      "Epoch: 273/300 Train Loss: 0.5787 Accuracy: 0.9648 Time: 13.80131  | Val Loss: 0.7106 Accuracy: 0.9213\n",
      "Epoch: 274/300 Train Loss: 0.5803 Accuracy: 0.9646 Time: 13.29750  | Val Loss: 0.7116 Accuracy: 0.9224\n",
      "Epoch: 275/300 Train Loss: 0.5818 Accuracy: 0.9639 Time: 13.43455  | Val Loss: 0.7107 Accuracy: 0.9216\n",
      "Epoch: 276/300 Train Loss: 0.5788 Accuracy: 0.9652 Time: 13.24404  | Val Loss: 0.7107 Accuracy: 0.9221\n",
      "Epoch: 277/300 Train Loss: 0.5758 Accuracy: 0.9655 Time: 13.63784  | Val Loss: 0.7108 Accuracy: 0.9223\n",
      "Epoch: 278/300 Train Loss: 0.5779 Accuracy: 0.9650 Time: 13.42050  | Val Loss: 0.7059 Accuracy: 0.9236\n",
      "Epoch: 279/300 Train Loss: 0.5764 Accuracy: 0.9659 Time: 13.60272  | Val Loss: 0.7084 Accuracy: 0.9242\n",
      "Epoch: 280/300 Train Loss: 0.5796 Accuracy: 0.9642 Time: 13.48281  | Val Loss: 0.7096 Accuracy: 0.9256\n",
      "Epoch: 281/300 Train Loss: 0.5758 Accuracy: 0.9671 Time: 13.31624  | Val Loss: 0.7097 Accuracy: 0.9243\n",
      "Epoch: 282/300 Train Loss: 0.5773 Accuracy: 0.9651 Time: 13.59325  | Val Loss: 0.7125 Accuracy: 0.9225\n",
      "Epoch: 283/300 Train Loss: 0.5788 Accuracy: 0.9648 Time: 13.56255  | Val Loss: 0.7072 Accuracy: 0.9222\n",
      "Epoch: 284/300 Train Loss: 0.5798 Accuracy: 0.9643 Time: 13.46927  | Val Loss: 0.7075 Accuracy: 0.9222\n",
      "Epoch: 285/300 Train Loss: 0.5768 Accuracy: 0.9671 Time: 13.75527  | Val Loss: 0.7043 Accuracy: 0.9237\n",
      "Epoch: 286/300 Train Loss: 0.5776 Accuracy: 0.9657 Time: 13.42032  | Val Loss: 0.7057 Accuracy: 0.9235\n",
      "Epoch: 287/300 Train Loss: 0.5756 Accuracy: 0.9669 Time: 13.83839  | Val Loss: 0.7090 Accuracy: 0.9239\n",
      "Epoch: 288/300 Train Loss: 0.5793 Accuracy: 0.9653 Time: 13.59119  | Val Loss: 0.7053 Accuracy: 0.9246\n",
      "Epoch: 289/300 Train Loss: 0.5728 Accuracy: 0.9669 Time: 13.34926  | Val Loss: 0.7059 Accuracy: 0.9237\n",
      "Epoch: 290/300 Train Loss: 0.5797 Accuracy: 0.9647 Time: 13.46474  | Val Loss: 0.7042 Accuracy: 0.9226\n",
      "Epoch: 291/300 Train Loss: 0.5743 Accuracy: 0.9668 Time: 13.57380  | Val Loss: 0.7050 Accuracy: 0.9246\n",
      "Epoch: 292/300 Train Loss: 0.5726 Accuracy: 0.9673 Time: 13.78816  | Val Loss: 0.7031 Accuracy: 0.9238\n",
      "Epoch: 293/300 Train Loss: 0.5753 Accuracy: 0.9665 Time: 13.29660  | Val Loss: 0.7052 Accuracy: 0.9257\n",
      "Epoch: 294/300 Train Loss: 0.5755 Accuracy: 0.9662 Time: 13.67347  | Val Loss: 0.7045 Accuracy: 0.9236\n",
      "Epoch: 295/300 Train Loss: 0.5734 Accuracy: 0.9669 Time: 13.72689  | Val Loss: 0.7052 Accuracy: 0.9238\n",
      "Epoch: 296/300 Train Loss: 0.5763 Accuracy: 0.9664 Time: 13.47335  | Val Loss: 0.7044 Accuracy: 0.9249\n",
      "Epoch: 297/300 Train Loss: 0.5762 Accuracy: 0.9657 Time: 13.61619  | Val Loss: 0.7043 Accuracy: 0.9237\n",
      "Epoch: 298/300 Train Loss: 0.5761 Accuracy: 0.9665 Time: 13.62985  | Val Loss: 0.7042 Accuracy: 0.9241\n",
      "Epoch: 299/300 Train Loss: 0.5760 Accuracy: 0.9658 Time: 13.27047  | Val Loss: 0.7041 Accuracy: 0.9241\n",
      "Epoch: 300/300 Train Loss: 0.5743 Accuracy: 0.9670 Time: 13.42976  | Val Loss: 0.7030 Accuracy: 0.9236\n"
     ]
    }
   ],
   "source": [
    "from hdd.train.warmup_scheduler import GradualWarmupScheduler\n",
    "from hdd.models.transformer.swin_transformer import SwinTransformer\n",
    "from hdd.train.classification_utils import naive_train_classification_model\n",
    "from hdd.models.nn_utils import count_trainable_parameter\n",
    "\n",
    "\n",
    "net = SwinTransformer(\n",
    "    img_size=32,\n",
    "    patch_size=2,\n",
    "    in_chans=3,\n",
    "    num_classes=10,\n",
    "    embed_dim=54,\n",
    "    depths=[6, 6, 6],\n",
    "    num_heads=[6, 6, 6],\n",
    "    dropout=0.0,\n",
    "    window_size=2,\n",
    "    mlp_ratio=4,\n",
    ").to(DEVICE)\n",
    "print(f\"#Parameter: {count_trainable_parameter(net)}\")\n",
    "criteria = nn.CrossEntropyLoss(label_smoothing=0.1)\n",
    "optimizer = torch.optim.Adam(\n",
    "    net.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-5\n",
    ")\n",
    "\n",
    "base_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(\n",
    "    optimizer, max_epochs, eta_min=1e-5\n",
    ")\n",
    "scheduler = GradualWarmupScheduler(\n",
    "    optimizer,\n",
    "    multiplier=1.0,\n",
    "    total_epoch=10,\n",
    "    after_scheduler=base_scheduler,\n",
    ")\n",
    "random_setting = naive_train_classification_model(\n",
    "    net,\n",
    "    criteria,\n",
    "    max_epochs,\n",
    "    train_dataloader,\n",
    "    val_dataloader,\n",
    "    DEVICE,\n",
    "    optimizer,\n",
    "    scheduler,\n",
    "    verbose=True,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pytorch-cu124",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
